Terrorism has been a constant hindrance to our efforts to achieve global peace and prosperity. From hostage situations and hijackings to mass shootings and bombings, terrorist attacks have a profound impact on both the victims and the larger society: they cause physical harm and loss of life, as well as emotional trauma and psychological distress. They can also have long-lasting socioeconomic consequences, disrupting trade and commerce, causing job losses, and eroding investor confidence.
As terrorist attacks grow more frequent than ever, it is crucial to understand their trends and patterns. In this blog post, I examine various aspects of terrorism, including regions, targets, methods, and motives, using three open-source datasets.
The first dataset, the Global Terrorism Database (GTD), contains information on over 180,000 terrorist attacks worldwide from 1970 to 2017. The other two datasets, World, Region, Country GDP and World Bank National Accounts data, include the Gross Domestic Product (GDP), fertility rate, and net migration of different countries over the same period. All three datasets were retrieved from the popular data science website Kaggle.
I hope this project sheds some light on the phenomenon of global terrorism and better equips us to combat it in the future. So let’s roll up our sleeves and demystify the world of global terrorism.
Analysis
```python
# Import modules
import pandas as pd
import numpy as np
import plotly.express as px
import nltk
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn import neighbors
import tensorflow as tf
from PIL import Image
from tensorflow.keras.models import Sequential
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Dense, Dropout, Conv1D, MaxPooling1D, Flatten, LSTM, SimpleRNN
from tensorflow.keras.layers import Bidirectional, GRU, UpSampling1D
from sklearn.preprocessing import LabelEncoder
from wordcloud import WordCloud
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.lines import Line2D
import matplotlib.patches as mpatches
import time
import warnings
import bar_chart_race as bcr

warnings.filterwarnings("ignore", category=FutureWarning)

# Read the first dataset
df_attacks = pd.read_csv("../data/globalterrorismdb_0718dist.csv", encoding="ISO-8859-1", low_memory=False)
df_attacks.head()
df_attacks = df_attacks[['eventid', 'iyear', 'imonth', 'iday', 'country_txt', 'region_txt', 'provstate', 'city', 'latitude', 'longitude', 'suicide', 'attacktype1_txt', 'targtype1_txt', 'gname', 'motive', 'weaptype1_txt', 'nkill']]
df_attacks.rename(columns={"eventid": "Event ID", "iyear": "Year", "imonth": "Month", "country_txt": "Country", "region_txt": "Region", "provstate": "Province/State", "city": "City", "latitude": "Latitude", "longitude": "Longitude", "suicide": "Suicide", "attacktype1_txt": "Attack Type", "targtype1_txt": "Target Type", "gname": "Terrorist Group", "motive": "Motive", "weaptype1_txt": "Weapon Type", "nkill": "Casualties"}, inplace=True)

# Read the second dataset
df_population = pd.read_csv("../data/population.csv")
df_population = df_population[["Country", "Year", "Migrants(net)", "FertilityRate"]]
df_population.rename(columns={"FertilityRate": "Fertility Rate", "Migrants(net)": "Migrants (net)"}, inplace=True)

# Read the third dataset
df_gdp = pd.read_csv("../data/world_country_gdp_usd.csv")
df_gdp = df_gdp[['Country Name', 'year', 'GDP_USD']]
df_gdp.rename(columns={"Country Name": "Country", "year": "Year", "GDP_USD": "GDP (in USD)"}, inplace=True)

# Read the dataset for the population of the US
df_us_population = pd.read_csv("../data/us_population.csv")
df_us_population = df_us_population[["state", "pop2022"]]
df_us_population.rename(columns={"state": "State", "pop2022": "Population"}, inplace=True)

# Show the terrorist attacks as a scatter animation
fig = px.scatter_geo(df_attacks, lon="Longitude", lat="Latitude", animation_frame="Year", color="Region", projection="equirectangular", animation_group="Year", title="Terrorist Attacks (1970 - 2017)")
fig.update_layout(title_x=0.44)
fig.show()
```
Figure 1: Global Terrorist Attacks
The animation above shows that there were a significant number of terrorist attacks in the US from 1970 to 2017. It is surprising to see this, especially when we consider the effort the US has made over the past 50 years in tackling terrorism in almost every terrorist-prone country.
Which states had the highest number of terrorist attacks? Let’s find out.
Terrorism in the US
```python
# Map US states to their abbreviations
us_state_to_abbrev = {
    "Alabama": "AL", "Alaska": "AK", "Arizona": "AZ", "Arkansas": "AR", "California": "CA",
    "Colorado": "CO", "Connecticut": "CT", "Delaware": "DE", "Florida": "FL", "Georgia": "GA",
    "Hawaii": "HI", "Idaho": "ID", "Illinois": "IL", "Indiana": "IN", "Iowa": "IA",
    "Kansas": "KS", "Kentucky": "KY", "Louisiana": "LA", "Maine": "ME", "Maryland": "MD",
    "Massachusetts": "MA", "Michigan": "MI", "Minnesota": "MN", "Mississippi": "MS", "Missouri": "MO",
    "Montana": "MT", "Nebraska": "NE", "Nevada": "NV", "New Hampshire": "NH", "New Jersey": "NJ",
    "New Mexico": "NM", "New York": "NY", "North Carolina": "NC", "North Dakota": "ND", "Ohio": "OH",
    "Oklahoma": "OK", "Oregon": "OR", "Pennsylvania": "PA", "Rhode Island": "RI", "South Carolina": "SC",
    "South Dakota": "SD", "Tennessee": "TN", "Texas": "TX", "Utah": "UT", "Vermont": "VT",
    "Virginia": "VA", "Washington": "WA", "West Virginia": "WV", "Wisconsin": "WI", "Wyoming": "WY",
    "District of Columbia": "DC", "American Samoa": "AS", "Guam": "GU", "Northern Mariana Islands": "MP",
    "Puerto Rico": "PR", "United States Minor Outlying Islands": "UM", "U.S. Virgin Islands": "VI",
}

# Filter all the attacks in the US alone
df_attacks_us = df_attacks[df_attacks["Country"] == "United States"]
df_attacks_us = pd.DataFrame(df_attacks_us.groupby("Province/State")["Event ID"].count())
df_attacks_us = df_attacks_us.reset_index()
df_attacks_us.rename(columns={"Province/State": "State", "Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_attacks_us = df_attacks_us[df_attacks_us["State"] != "Unknown"]
df_attacks_us["State Code"] = df_attacks_us["State"].apply(lambda x: us_state_to_abbrev[x])

# Standardize a column between 0 and 1 (min-max scaling)
def scale_column(df, column, minVal=float('-inf'), maxVal=float('inf')):
    if minVal == float('-inf'):
        minVal = min(df[column])
    if maxVal == float('inf'):
        maxVal = max(df[column])
    res = []
    for val in df[column]:
        res.append((val - minVal) / (maxVal - minVal))
    return res

# Count the number of terrorist attacks in each US state and standardize the number based on population
df_attacks_us = df_attacks_us.merge(df_us_population[['State', 'Population']])
df_attacks_us["Number of Terrorist Attacks (Standardised)"] = df_attacks_us["Number of Terrorist Attacks"] / df_attacks_us["Population"]
df_attacks_us["Number of Terrorist Attacks (Standardised)"] = scale_column(df_attacks_us, "Number of Terrorist Attacks (Standardised)")
df_attacks_us = df_attacks_us.sort_values(by="Number of Terrorist Attacks (Standardised)", ascending=False)

# Plot the choropleth for terrorist attacks in the US states
fig = px.choropleth(df_attacks_us, locations='State Code', color='Number of Terrorist Attacks (Standardised)', color_continuous_scale="Viridis", locationmode="USA-states", scope="usa", labels={'Number of Terrorist Attacks (Standardised)': 'No. of Attacks (Standardised)'}, title="Terrorist Attacks in the US (1970-2017)")
fig.update_layout(title_x=0.44)
fig.update_layout(legend={"xanchor": "right", "x": -0, "y": 1.9})
fig.update_layout(height=500, width=780)
fig.show()
```
Figure 2: Terrorist Attacks in the US
Figure 2 shows the number of terrorist attacks in each US state, calculated by dividing the total number of attacks in a given state by its population and then min-max scaling the result so that the state with the highest score is assigned a value of 1 and the state with the lowest score a value of 0. We see that New York, Oregon, California, Washington, and Nebraska are the five most terrorist-prone states in the US, while Kentucky, South Carolina, West Virginia, Alaska, and Arkansas are the safest in terms of the frequency of terrorist attacks.
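To make the scaling step concrete, here is a minimal sketch of the same min-max idea on made-up per-capita rates (the states and numbers are illustrative, not taken from the dataset):

```python
import pandas as pd

# Hypothetical attacks per 100,000 residents for three states (illustrative only)
df = pd.DataFrame({"State": ["A", "B", "C"], "Rate": [2.0, 10.0, 6.0]})

# Min-max scaling: the highest rate maps to 1, the lowest to 0
lo, hi = df["Rate"].min(), df["Rate"].max()
df["Scaled"] = (df["Rate"] - lo) / (hi - lo)

print(df["Scaled"].tolist())  # → [0.0, 1.0, 0.5]
```

This is exactly what the `scale_column` helper above does with its default bounds.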
So what exactly motivates these terrorist groups and has it changed over the last fifty years?
```python
# Create two word clouds showing the motives of the terrorist attacks.
# First, download the stopwords and add common words from the motives column
stpwrd = nltk.corpus.stopwords.words('english')
extended_list = ["specific", "motive", "unknown", "Unknown", "incident", "claimed", "responsibility", "however", "unaffiliated", "individual", "identified", "killed", "stated", "anti", "attacks", "protest", "carried", "attack", "trend", "larger", "may", "part", "following", "community", "sources", "violence", "targeting", "noted", "posited", "suspected", "members", "targeted", "also", "assailant", "perpetrator", "meant", "bring", "attention", "practice"]
stpwrd.extend(extended_list)

# Select all the attacks in the US
df_attacks_us = df_attacks[df_attacks["Country"] == "United States"]
df_attacks_us = df_attacks_us[["Year", "Motive"]]
df_attacks_us = df_attacks_us.dropna()

# Plot a word cloud of the motives for a given period (inclusive)
def plot_motive_cloud(start, end, color):
    temp_df = df_attacks_us[(df_attacks_us["Year"] >= start) & (df_attacks_us["Year"] <= end)]
    motive = " ".join(list(temp_df["Motive"].values))
    wordcloud = WordCloud(width=1000, height=800, background_color='white', stopwords=stpwrd, color_func=lambda *args, **kwargs: color, min_font_size=10).generate(motive)
    plt.figure(figsize=(12, 12), facecolor=None)
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.tight_layout(pad=2)
    plt.title("Attack Motives (" + str(start) + " - " + str(end) + ")", fontdict={'fontsize': 36})
    plt.show()

# One word cloud for 1970-1999, one for 2000-2017
plot_motive_cloud(1970, 1999, "green")
plot_motive_cloud(2000, 2017, "purple")
```
(a) 1970-1999
(b) 2000-2017
Figure 3: Attack Motives in the US
Both word clouds share a common theme of abortion, suggesting that this has been a prominent topic of conflict in the US for several decades. However, they also differ in significant ways. The first word cloud, which covers the pre-2000 period, reveals issues relevant to Puerto Rico, Vietnam, and African American groups. The second, representing the post-2000 period, shows themes related to Iraq, ISIL, and Islamic states, indicating an increase in religiously motivated attacks in the US over the past 20 years. This shift aptly reflects changes in the political landscape, both domestically and internationally: from fighting the spread of communism and racism to battling religiously motivated terrorism.
Global Terrorism
Now that we have analyzed the state of terrorism in the US, let’s move on to the bigger picture. We begin by examining how the frequency of terrorist attacks has changed over the last 50 years.
```python
# Count the number of terrorist attacks each year and plot a bar chart.
yearly_freq = pd.DataFrame(df_attacks.groupby("Year")["Event ID"].count()).reset_index()
yearly_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
fig = px.bar(yearly_freq, x="Year", y="Number of Terrorist Attacks", title="Frequency of Terrorist Attacks (1970-2017)")
fig.update_layout(title_x=0.5, height=400)
fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)', 'paper_bgcolor': 'rgba(0,0,0,0)'})
fig.show()
```
Figure 4: Frequency of Terrorist Attacks
It is clear from Figure 4 that the number of terrorist attacks reached its lowest points around 1972 and 2003 (note that the data for 1994 is missing, not zero) and increased sharply over the last 10 years of the dataset (2007-2017).
So, what parts of the world have experienced the highest number of terrorist attacks?
```python
# Count the number of terrorist attacks in each geographical region, grouped by attack type.
region_freq = pd.DataFrame(df_attacks.groupby(["Region", "Attack Type"])["Event ID"].count()).reset_index()
region_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
region_freq = region_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)
region_freq['Attack Type'] = region_freq['Attack Type'].replace(
    ['Bombing/Explosion', 'Hostage Taking (Kidnapping)', 'Facility/Infrastructure Attack', 'Hostage Taking (Barricade Incident)'],
    ['Bombing', 'Hostage', 'Facility Attack', 'Hostage (Barr.)'])

# Plot the bar chart.
fig = px.bar(region_freq, x="Region", y="Number of Terrorist Attacks", color="Attack Type", title="Terrorist Attacks in Different Regions", barmode="relative")
fig.update_layout(title_x=0.5, height=500)
fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)', 'paper_bgcolor': 'rgba(0,0,0,0)'})
fig.show()
```
Figure 5: Terrorist Attacks in Different Regions
Figure 5 shows that the Middle East & North Africa, South Asia, and South America were the three most terrorist-prone regions. On the other hand, Australasia & Oceania, Central Asia, and East Asia were the safest regions in terms of terrorism. It is also worth noting that in every geographical region, bombing and armed assault were the most common forms of attack.
Let’s delve deeper to see which countries in these terrorist-prone regions contributed the highest number of terrorist incidents.
```python
# Count the number of casualties and attacks in each country.
df_countries_casualties = pd.DataFrame(df_attacks.groupby(["Country"])["Casualties"].sum().reset_index())
df_countries_terrorist_count = pd.DataFrame(df_attacks.groupby(["Country"])["Event ID"].count().reset_index())
df_countries_terrorist_count.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_merged_casualties_count = df_countries_casualties.merge(df_countries_terrorist_count[["Country", "Number of Terrorist Attacks"]])

# Map the country names to their ISO codes
df_iso_codes = px.data.gapminder()[["country", "iso_alpha"]]
df_iso_codes.rename(columns={"country": "Country", "iso_alpha": "Country Code"}, inplace=True)
df_iso_codes.drop_duplicates(inplace=True)
df_iso_codes = df_iso_codes.reset_index(drop=True)
df_countries_terrorist_count = df_countries_terrorist_count.merge(df_iso_codes[['Country', 'Country Code']])
df_countries_terrorist_count["No. of Attacks"] = df_countries_terrorist_count["Number of Terrorist Attacks"]

# Plot a choropleth representing the frequency of attacks in different countries.
fig = px.choropleth(df_countries_terrorist_count, locations="Country Code", color="No. of Attacks", hover_name="Country", color_continuous_scale=px.colors.sequential.Plasma, title="Terrorist Attacks (1970 - 2017)")
fig.update_layout(title_x=0.5, height=500)
fig.show()
```
Figure 6: Countries with the Highest Number of Attacks
Figure 6 shows that Iraq in the Middle East; Afghanistan, Pakistan, and India in South Asia; and Colombia in South America were the most terrorist-prone countries.
The analysis of global terrorism is incomplete without information on terrorist groups. How about we visualize the top 15 most notorious terrorist groups based on the number of casualties from their attacks?
```python
# Count the number of casualties for different terrorist groups
groupwise_casualty_freq = pd.DataFrame(df_attacks.groupby("Terrorist Group")["Casualties"].sum()).reset_index()
groupwise_casualty_freq = groupwise_casualty_freq.sort_values(by="Casualties", ascending=False)[:16]
notorious_groups = list(groupwise_casualty_freq["Terrorist Group"])
notorious_groups.remove("Unknown")
df_notorious_groups = df_attacks[df_attacks["Terrorist Group"].isin(notorious_groups)]
df_notorious_groups = pd.DataFrame(df_notorious_groups.groupby(["Terrorist Group", "Year"])["Casualties"].sum().reset_index())
df_notorious_groups["Terrorist Group"] = df_notorious_groups["Terrorist Group"].replace(
    ["Farabundo Marti National Liberation Front (FMLN)", "Islamic State of Iraq and the Levant (ISIL)", "Kurdistan Workers' Party (PKK)", "Liberation Tigers of Tamil Eelam (LTTE)", "New People's Army (NPA)", "Nicaraguan Democratic Force (FDN)", "Revolutionary Armed Forces of Colombia (FARC)", "Shining Path (SL)", "Tehrik-i-Taliban Pakistan (TTP)"],
    ["Farabundo Liberation", "ISIL", "Kurdistan W.", "Tamil Tigers", "New People's Army", "Nicaraguan Force", "Colombian Force", "Shining Path", "Taliban Pakistan"])

# Plot a line chart for the 15 terrorist groups with the highest number of casualties
fig = px.line(df_notorious_groups, x="Year", y="Casualties", color="Terrorist Group", title="Attacks by Different Terrorist Groups")
fig.update_layout(title_x=0.5, height=500)
fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)', 'paper_bgcolor': 'rgba(0,0,0,0)'})
fig.show()
```
Figure 7: Attacks by Different Terrorist Groups
One cannot fail to notice the 2001 peak for Al Qaida’s suicide attacks against the United States (widely known as 9/11), which is often taken as the beginning of the rise of other Islamic religious extremist terrorist groups. Another interesting finding: as the steep lines after 2010 in Figure 7 show, the Taliban, Boko Haram, and ISIL appear to have killed more people than the other 12 terrorist groups combined over the last 50 years.
So what exactly do these terrorist groups target? Let’s find out.
```python
# Select the most common targets of the terrorist groups.
TOP_N = 11
target_freq = pd.DataFrame(df_attacks.groupby("Target Type")["Event ID"].count()).reset_index()
target_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
target_freq = target_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[:TOP_N]
target_freq = target_freq[target_freq['Target Type'] != "Unknown"]

# Plot a bar chart.
fig = px.bar(target_freq, x='Target Type', y='Number of Terrorist Attacks', title="Common Targets of Terrorist Attacks")
fig.update_layout(title_x=0.5, height=500)
fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)', 'paper_bgcolor': 'rgba(0,0,0,0)'})
fig.show()
```
Figure 8: Common Targets of Terrorist Attacks
Most attacks have targeted private citizens & property, the military, and the police. Private citizens and their property are generally the easiest targets and also the largest group, which may explain the high number of attacks against them.
Socioeconomic Aspects
Now, let’s change our direction a little bit. We will analyze how terrorism is related to different socioeconomic factors like GDP and fertility rate.
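Before looking at the charts, it is worth noting how such a relationship could be quantified. The sketch below is not part of the original analysis and uses invented numbers; it simply shows how a Pearson correlation between yearly attack counts and GDP could be computed with pandas:

```python
import pandas as pd

# Toy yearly series for one hypothetical country (illustrative numbers only):
# attack counts rising while GDP falls
df = pd.DataFrame({
    "Attacks": [120, 150, 200, 260, 310],
    "GDP (in USD)": [1500, 1400, 1300, 1250, 1100],
})

# Pearson correlation between attack frequency and GDP
corr = df["Attacks"].corr(df["GDP (in USD)"])
print(round(corr, 2))  # → -0.98
```

A strongly negative value like this would suggest that attacks tend to coincide with lower GDP, though correlation alone says nothing about causation.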
```python
# Map a country to its geographical region
def map_region(country):
    return list(df_attacks[df_attacks["Country"] == country]["Region"])[0]

# Find the top 5 countries with the highest number of terrorist attacks
country_freq = pd.DataFrame(df_attacks.groupby("Country")["Event ID"].count()).reset_index()
country_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
country_freq = country_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[:10]
country_freq["Region"] = country_freq["Country"].apply(map_region)
top_five_countries = list(country_freq["Country"].values)[:5]

# Count the yearly frequency of terrorist attacks for the top five countries.
country_freq_year = pd.DataFrame(df_attacks.groupby(["Year", "Country"])["Event ID"].count().reset_index())
country_freq_year = country_freq_year[country_freq_year["Country"].isin(top_five_countries)]
country_freq_year.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)

# Select the GDP of the top five countries and the world average.
df_terrorist_gdp = df_gdp[(df_gdp["Country"].isin(top_five_countries)) & (df_gdp["Year"] >= 1970) & (df_gdp["Year"] <= 2017)]
df_all_gdp = df_gdp[(df_gdp["Year"] >= 1970) & (df_gdp["Year"] <= 2017)].dropna()
df_all_gdp = pd.DataFrame(df_all_gdp.groupby("Year").mean().reset_index())
df_all_gdp.rename(columns={"GDP (in USD)": "World"}, inplace=True)

# Assign a specific color to each of the countries. World will take the black color.
colorList = list(px.colors.qualitative.T10)
if colorList[0] != "black":
    colorList.insert(0, "black")
for country in top_five_countries:
    temp_gdp = df_terrorist_gdp[df_terrorist_gdp["Country"] == country]
    df_all_gdp[country] = list(temp_gdp["GDP (in USD)"])

# Plot a line chart showing the GDP of the top five countries and the world.
fig = px.line(df_all_gdp, x='Year', y=df_all_gdp.columns[1:], title="GDP of Terrorist-prone Countries", color_discrete_sequence=colorList, labels={"value": "GDP (in USD)", "variable": ""})
fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)', 'paper_bgcolor': 'rgba(0,0,0,0)'})
fig.update_layout(title_x=0.5, height=400, width=800)
fig.show()

# Select the fertility rate of the top five countries.
df_all_fertility = df_population[(df_population["Year"] >= 1970) & (df_population["Year"] <= 2017)]
df_terrorist_fertility = df_population[(df_population["Country"].isin(top_five_countries)) & (df_population["Year"] >= 1970) & (df_population["Year"] <= 2017)]
df_all_fertility = df_all_fertility.dropna().drop(['Migrants (net)'], axis=1)
df_all_fertility = pd.DataFrame(df_all_fertility.groupby("Year").mean().reset_index())
df_all_fertility.rename(columns={"Fertility Rate": "World"}, inplace=True)
for country in top_five_countries:
    temp_fertility = df_terrorist_fertility[df_terrorist_fertility["Country"] == country]
    df_all_fertility[country] = list(temp_fertility["Fertility Rate"])

# Plot a line chart showing the fertility rate of the top five countries along with the world average.
fig = px.line(df_all_fertility, x='Year', y=df_all_fertility.columns[1:], title="Fertility Rate of Terrorist-prone Countries", color_discrete_sequence=colorList, labels={"value": "Fertility Rate", "variable": ""})
fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)', 'paper_bgcolor': 'rgba(0,0,0,0)'})
fig.update_layout(title_x=0.5, height=400, width=800)
fig.show()
```
(a) GDP
(b) Fertility Rate
Figure 9: Socio-economic Aspects of Terrorist-prone Countries
Figure 9 shows the GDP and fertility rate of the five most terrorist-prone countries identified above. We can clearly see that these countries generally have a lower GDP and a higher fertility rate than the global average over the given period. India is an exception, with its GDP increasing faster than the global average; likewise, Colombia’s fertility rate has been below the global average since the mid-1980s.
Machine and Deep Learning
Figure 10: A Simple Feed Forward Neural Network
```python
from sklearn.preprocessing import RobustScaler

# Remove the columns we will not be using for the modeling.
try:
    del df_attacks["Event ID"]
    del df_attacks["Motive"]
    del df_attacks["Latitude"]
    del df_attacks["Longitude"]
except KeyError:
    print("Some of the columns are not present")
df_attacks = df_attacks.dropna()

# Label-encode the categorical columns.
categorical = ['Country', 'Region', 'Province/State', 'City', 'Attack Type', 'Target Type', 'Terrorist Group', 'Weapon Type']
df_attacks[categorical] = df_attacks[categorical].apply(LabelEncoder().fit_transform)

# Split into predictor and response variables.
y = df_attacks["Casualties"]
X = df_attacks.drop(['Casualties'], axis=1)

# Split the data into train (70%), validation (15%), and test (15%) sets.
# The second split uses 0.15 / 0.85 so that validation is 15% of the full dataset.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.15 / 0.85, random_state=42)

# Scale the dataset. Fit the scaler on the training set only to avoid leakage.
scaler = RobustScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

# Neural networks
def create_bilstm():
    model = Sequential()
    model.add(Bidirectional(LSTM(128, activation='relu', input_shape=(12, 1), return_sequences=True)))
    model.add(Dropout(0.2))
    model.add(Bidirectional(LSTM(64, activation='relu')))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1))
    return model

def create_ffnn():
    model = Sequential()
    model.add(Dense(128, activation='relu', input_shape=(12,)))
    model.add(Dropout(0.3))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='sigmoid'))
    model.add(Dense(16, activation='tanh'))
    model.add(Dense(1))
    return model

def create_cnn():
    model = Sequential()
    model.add(Conv1D(32, 3, activation='relu', input_shape=(12, 1)))
    model.add(MaxPooling1D(2))
    model.add(Conv1D(64, 3, activation='relu'))
    model.add(MaxPooling1D(2))
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dense(1))
    return model

def create_gru():
    model = Sequential()
    model.add(GRU(64, activation='tanh', input_shape=(12, 1)))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='tanh'))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    return model

# Result container
result = []
dlModels = {"Feed Forward NN": create_ffnn(), "CNN": create_cnn(), "GRU": create_gru(), "Bi-LSTM": create_bilstm()}

# Reshape the train, validation, and test sets for the sequence models.
X_train_new = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_val_new = X_val.reshape(X_val.shape[0], X_val.shape[1], 1)
X_test_new = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# Train the neural networks one at a time.
for name, model in dlModels.items():
    start_time = time.time()
    model.compile(optimizer='adam', loss='mse')
    if name == "Feed Forward NN":
        # The feed-forward network takes flat (12,) vectors...
        model.fit(X_train, y_train, epochs=20, batch_size=128, validation_data=(X_val, y_val))
        y_pred = model.predict(X_test)
    else:
        # ...while the CNN, GRU, and Bi-LSTM expect (12, 1) sequences.
        model.fit(X_train_new, y_train, epochs=20, batch_size=128, validation_data=(X_val_new, y_val))
        y_pred = model.predict(X_test_new)
    result.append([name, round(np.sqrt(mean_squared_error(y_test, y_pred)), 2), round(time.time() - start_time, 2)])

# Train the machine learning models one at a time.
mlModels = {"Random Forest": RandomForestRegressor(), "K Neighbors": neighbors.KNeighborsRegressor(), "Decision Trees": DecisionTreeRegressor()}
for name, model in mlModels.items():
    start_time = time.time()
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    result.append([name, round(np.sqrt(mean_squared_error(y_test, pred)), 2), round(time.time() - start_time, 2)])

# Save the results in a csv file.
pd.options.display.float_format = '{:.2f}'.format
result_df = pd.DataFrame(result, columns=["Model", "Root Mean Squared Error", "Time (in seconds)"])
result_df.to_csv("./results.csv")
```
Let’s take the machine and deep learning algorithms out of our arsenal and tackle the problem of predicting the number of casualties of a given attack based on its date, country, region, state, city, suicidal intent, attack type, target type, terrorist group, and weapon used. Such a model could prove invaluable for intelligence agencies in assessing the severity of potential attacks and preparing for them.
The dataset is split into train, validation, and test sets in the ratio 70:15:15. The train and validation sets are used during the training phase, and the test set is used to assess the models based on the time they take and their root-mean-square error (RMSE).
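A 70:15:15 split can be produced with two calls to scikit-learn's `train_test_split`. A minimal sketch on dummy data (the arrays below are placeholders, not the real feature matrix): hold out 30% first, then split that holdout half-and-half into validation and test.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data standing in for the encoded feature matrix and casualty counts
X, y = np.arange(2000).reshape(1000, 2), np.arange(1000)

# Hold out 30%, then split the holdout evenly into validation and test sets.
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.50, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # → 700 150 150
```

Note that splitting off 15% first and then passing `test_size=0.15` again would *not* give 70:15:15, because the second fraction applies to the remaining 85% of the data.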
We use four deep learning models (a Feed Forward Neural Network, a Bi-directional Long Short-Term Memory network, a Convolutional Neural Network, and a Gated Recurrent Unit network) and three classical machine learning models (Random Forest, K-Nearest Neighbors, and Decision Trees) for this problem. Each of the neural networks has between five and seven layers, chosen based on performance on the validation set.
```python
# Read the results from the csv.
result_df = pd.read_csv("../results/results.csv")
result_df = result_df.sort_values(by=['Root Mean Squared Error'])

# Plot the data.
matplotlib.rc_file_defaults()
sns.set_style(style=None, rc=None)
fig, ax1 = plt.subplots(figsize=(12, 6))
colors = ["#5D3FD3", "#5D3FD3", "#5D3FD3", "#5D3FD3", "#0096FF", "#0096FF", "#0096FF"]

# Plot the bar chart and set figure options.
sns.barplot(data=result_df, x='Model', y='Root Mean Squared Error', alpha=0.5, ax=ax1, palette=colors)
ax1.set_xticklabels(ax1.get_xticklabels(), fontsize=12)
ax1.set_xlabel("Models", fontsize=14)
ax1.set_ylabel("Root Mean Squared Error", fontsize=14)
ax1.set_title("Efficiency of Models", fontsize=16)

# Plot the line plot on the same chart and adjust the alpha level of the legend patches.
ax2 = ax1.twinx()
ax2.set_ylabel("Time (in seconds)", fontsize=14)
dl = mpatches.Patch(color="#5D3FD3")
ml = mpatches.Patch(color="#0096FF")
custom_line = [Line2D([0], [0], color='#0096FF', lw=2), dl, ml]
leg = plt.legend(custom_line, ["Time", "DL Models", "ML Models"], loc="upper left")
for index, lh in enumerate(leg.legendHandles):
    if index > 0:
        lh.set_alpha(0.5)
sns.lineplot(data=list(result_df["Time (in seconds)"]), marker='o', ax=ax2, color='#0096FF')
plt.show()
```
Figure 11: Efficiency of Models
The Feed Forward Neural Network turns out to be the most effective model, achieving an RMSE of 8.68, and Decision Trees is the fastest, completing training and prediction in 0.99 seconds. In general, the neural networks achieve a lower RMSE than the classical machine learning models, but they are also slower to train and test. The lowest RMSE we obtained, 8.68, is still far higher than the average number of casualties per attack (2.40), so it is safe to say we have some work to do before these models are truly usable.
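One way to put an RMSE of 8.68 in context is to compare it against a naive baseline that always predicts the mean number of casualties; a model only earns its keep if it beats that. A hedged sketch on toy numbers (the array below is illustrative, not the real casualty data, though it mimics its heavy tail: mostly small attacks plus a rare mass-casualty event):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Illustrative casualty counts: mostly small attacks plus one mass-casualty outlier
y_true = np.array([0, 0, 1, 2, 3, 1, 0, 5, 2, 60])

# Baseline: always predict the mean casualty count
baseline_pred = np.full_like(y_true, y_true.mean(), dtype=float)
baseline_rmse = np.sqrt(mean_squared_error(y_true, baseline_pred))

print(round(baseline_rmse, 2))  # → 17.6
```

The outlier dominates the squared error, which is why RMSE can sit far above the mean casualty count even for a reasonable predictor; a comparison against such a baseline (or a robust metric like median absolute error) would make the 8.68 figure easier to interpret.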
Our analysis ends here, but in the future we will explore more variables in the terrorism database, along with other socioeconomic factors and their relationship with terrorist attacks. We will also perform extensive hyperparameter tuning and train more sophisticated models, such as variants of FractalNets, ResNets, and XceptionNets, on a larger dataset, combining and engineering features from different socioeconomic factors to achieve the lowest possible RMSE.
More Animations
And before you go, here’s a little treat for your eyes.
Code
# Bar chart race: countries with the highest number of terrorist attacks.
# Yearly data is computed for each country.
df_countries_pivot = pd.DataFrame(df_attacks.groupby(["Country", "Year"]).count()).reset_index()
df_countries_pivot.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_countries_pivot = df_countries_pivot.pivot_table(values="Number of Terrorist Attacks", index=["Year"], columns="Country")
df_countries_pivot.fillna(0, inplace=True)
df_countries_pivot = df_countries_pivot.sort_index()
# Convert the yearly counts into running totals for every country.
df_countries_pivot = df_countries_pivot.cumsum()
bcr.bar_chart_race(df=df_countries_pivot, n_bars=10, period_length=1000, sort="desc",
                   title="Countries with the Highest Number of Terrorist Attacks",
                   filter_column_colors=True, filename=None)

# Bar chart race: terrorist attacks by geographical region.
df_region_pivot = pd.DataFrame(df_attacks.groupby(["Region", "Year"]).count()).reset_index()
df_region_pivot.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_region_pivot = df_region_pivot.pivot_table(values="Number of Terrorist Attacks", index=["Year"], columns="Region")
df_region_pivot.fillna(0, inplace=True)
df_region_pivot = df_region_pivot.sort_index()
df_region_pivot = df_region_pivot.cumsum()
bcr.bar_chart_race(df=df_region_pivot, n_bars=12, period_length=1000, sort="desc",
                   title="Terrorist Attacks Based on Geographical Regions",
                   filter_column_colors=True, filename=None)

# Bar chart race: terrorist groups with the highest number of casualties.
df_animation_pivot = df_notorious_groups.pivot_table(values="Casualties", index=["Year"], columns="Terrorist Group")
df_animation_pivot.fillna(0, inplace=True)
df_animation_pivot = df_animation_pivot.sort_index()
# Drop the "Unknown" group if present (it may already have been filtered out).
df_animation_pivot = df_animation_pivot.drop(columns=["Unknown"], errors="ignore")
df_animation_pivot = df_animation_pivot.cumsum()
bcr.bar_chart_race(df=df_animation_pivot, n_bars=10, period_length=1000, sort="desc",
                   title="Terrorist Groups with the Highest Number of Attacks",
                   filename=None)

The finished animations:

https://www.youtube.com/embed/qmtzgzcPTbk
https://www.youtube.com/embed/Fs7OdDxY3sg
https://www.youtube.com/embed/jvOC-eA_Hzo
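The pivot-then-accumulate step that feeds each race can be seen on a tiny stand-in frame (the country names and counts below are hypothetical, not taken from the GTD):

```python
import pandas as pd

# Toy stand-in for df_attacks: one row per recorded attack.
df = pd.DataFrame({
    "Country": ["Iraq", "Iraq", "India", "Iraq", "India"],
    "Year":    [1990,   1990,   1990,    1991,   1991],
})

# Yearly attack counts, with years as rows and countries as columns.
pivot = df.groupby(["Country", "Year"]).size().unstack(level=0, fill_value=0)

# Running totals down the years, which is what each frame of the race displays.
cumulative = pivot.cumsum()
print(cumulative)
```

Each column accumulates independently, so by 1991 the "Iraq" bar shows three attacks (two from 1990 plus one from 1991) rather than that year's count alone.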
References
Countries in the world by population (2023). Worldometer. Retrieved February 5, 2023, from https://www.worldometers.info/world-population/population-by-country/

Information on more than 200,000 terrorist attacks. Global Terrorism Database. Retrieved February 5, 2023, from https://www.start.umd.edu/gtd/

Lai, N. T. C. (2023, February 3). World population (1955-2020). Kaggle. Retrieved February 5, 2023, from https://www.kaggle.com/datasets/nguyenthicamlai/population-2022

Mishinev, T. (2022, September 9). World, region, country GDP/GDP per capita. Kaggle. Retrieved February 5, 2023, from https://www.kaggle.com/datasets/tmishinev/world-country-gdp-19602021

National Consortium for the Study of Terrorism and Responses to Terrorism. Global terrorism database. Kaggle. Retrieved February 5, 2023, from https://www.kaggle.com/datasets/START-UMD/gtd

World Bank. GDP (current US$). World Bank national accounts data. Retrieved February 5, 2023, from https://data.worldbank.org/indicator/NY.GDP.MKTP.CD
Source Code
---title: "The State of Global Terrorism"subtitle: "An In-Depth Analysis of Trends and Threats"author: "Shreehar Joshi"bibliography: references.bibnumber-sections: falseformat: html: toc: true theme: - cosmo rendering: embed-resources code-fold: true code-tools: true pdf: defaultjupyter: python3---{width=100%}Terrorism has been a constant hindrance on our effort to achieve global peace and prosperity. From hostage situations and hijackings to mass shootings and bombings, terrorist attacks have a profound impact on both the victims and the larger society; they cause physical harm and loss of life, as well as emotional trauma and psychological distress. Needless to say, they can have long-lasting socioeconomic consequences, disrupting trade and commerce, causing job losses, and decreasing investor confidence.As the frequency of terrorist attacks is increasing at a rate faster than ever, it is crucial to understand them and their trends and patterns. In this blog post, I will be examining various aspects of terrorism including regions, targets, methods, and motives using three open-source datasets.The first dataset, [Global Terrorism Database (GTD)](https://www.kaggle.com/datasets/START-UMD/gtd), contains information on over 180,000 global terrorist attacks from 1970 to 2017. Similarly, the two other datasets - [World, Region, Country GDP](https://www.kaggle.com/datasets/tmishinev/world-country-gdp-19602021) and [World Bank National Accounts data](https://data.worldbank.org/indicator/NY.GDP.MKTP.CD), includes the data for Gross Domestic Production (GDP), fertility rate and net migration of different countries in the aforementioned period. All the three datasets were retrieved from the popular data science website [Kaggle](https://www.kaggle.com/). I hope this project will shed some light on the phenomenon of global terrorism and will equip us better to combat them in the future. 
So let's roll up our sleeves and demystify the world of global terrorism.## Analysis ```{python echo=FALSE}#| label: fig-import#| fig-cap: "Global Terrorist Attacks"# Import modulesimport pandas as pdimport numpy as npimport plotly.express as pximport nltkfrom sklearn.metrics import mean_squared_errorfrom sklearn.ensemble import RandomForestRegressorfrom sklearn.tree import DecisionTreeRegressorfrom sklearn import neighborsimport tensorflow as tffrom PIL import Imagefrom tensorflow.keras.models import Sequentialfrom sklearn.model_selection import train_test_splitfrom tensorflow.keras.layers import Dense, Dropout, Conv1D, MaxPooling1D, Flatten, LSTM, SimpleRNNfrom tensorflow.keras.layers import Bidirectional, GRU, UpSampling1Dimport plotly.express as pxfrom sklearn.preprocessing import LabelEncoderfrom wordcloud import WordCloudimport matplotlib.pyplot as pltimport matplotlibimport matplotlib.pyplot as pltimport seaborn as snsfrom matplotlib.lines import Line2Dimport matplotlib.patches as mpatchesimport timeimport warningsimport bar_chart_race as bcrwarnings.filterwarnings("ignore", category=FutureWarning)# Read first databasedf_attacks = pd.read_csv("../data/globalterrorismdb_0718dist.csv", encoding="ISO-8859-1", low_memory=False)df_attacks.head()df_attacks = df_attacks[['eventid','iyear', 'imonth', 'iday', 'country_txt', 'region_txt', 'provstate', 'city', 'latitude', 'longitude', 'suicide', 'attacktype1_txt', 'targtype1_txt', 'gname', 'motive', 'weaptype1_txt', 'nkill']]df_attacks.rename(columns={"eventid": "Event ID", "iyear": "Year", "imonth": "Month", "country_txt": "Country", "region_txt": "Region", "provstate": "Province/State", "city": "City", "latitude": "Latitude", "longitude": "Longitude", "suicide": "Suicide", "attacktype1_txt": "Attack Type","targtype1_txt": "Target Type", "gname": "Terrorist Group", "motive": "Motive", "weaptype1_txt": "Weapon Type", "nkill": "Casualties"}, inplace=True)# Read second databasedf_population = 
pd.read_csv("../data/population.csv")df_population = df_population[["Country","Year", "Migrants(net)", "FertilityRate"]]df_population.rename(columns= {"FertilityRate": "Fertility Rate", "Migrants(net)": "Migrants (net)"}, inplace=True)# Read third databasedf_gdp = pd.read_csv("../data/world_country_gdp_usd.csv")df_gdp = df_gdp[['Country Name','year', 'GDP_USD']]df_gdp.rename(columns= {"Country Name": "Country", "year": "Year", "GDP_USD":"GDP (in USD)", "GDP_per_capita_USD": "GDP (per capita)"}, inplace=True)# Read database for the population of the USdf_us_population = pd.read_csv("../data/us_population.csv")df_us_population = df_us_population[["state", "pop2022"]]df_us_population.rename(columns= {"state": "State", "pop2022": "Population"}, inplace=True) # Show the terrorist attacks as a scatter animationfig = px.scatter_geo(df_attacks, lon="Longitude", lat="Latitude", animation_frame="Year", color="Region", projection="equirectangular", animation_group="Year", title="Terrorist Attacks (1970 - 2017)")fig.update_layout(title_x=0.44)fig.show()```The animation above shows that there were a significant number of terrorist attacks in the US from 1970 to 2017. It is surprising to see this, especially when we consider the effort the US has madeover the past 50 years in tackling terrorism in almost every terrorist-prone country. Which states had the highest number of terrorist attacks? 
Lets find out.### Terrorism in the US```{python}#| label: fig-us#| fig-cap: "Terrorist Attacks in the US"# Map US states to their abbreviationsus_state_to_abbrev = {"Alabama": "AL","Alaska": "AK","Arizona": "AZ","Arkansas": "AR","California": "CA","Colorado": "CO","Connecticut": "CT","Delaware": "DE","Florida": "FL","Georgia": "GA","Hawaii": "HI","Idaho": "ID","Illinois": "IL","Indiana": "IN","Iowa": "IA","Kansas": "KS","Kentucky": "KY","Louisiana": "LA","Maine": "ME","Maryland": "MD","Massachusetts": "MA","Michigan": "MI","Minnesota": "MN","Mississippi": "MS","Missouri": "MO","Montana": "MT","Nebraska": "NE","Nevada": "NV","New Hampshire": "NH","New Jersey": "NJ","New Mexico": "NM","New York": "NY","North Carolina": "NC","North Dakota": "ND","Ohio": "OH","Oklahoma": "OK","Oregon": "OR","Pennsylvania": "PA","Rhode Island": "RI","South Carolina": "SC","South Dakota": "SD","Tennessee": "TN","Texas": "TX","Utah": "UT","Vermont": "VT","Virginia": "VA","Washington": "WA","West Virginia": "WV","Wisconsin": "WI","Wyoming": "WY","District of Columbia": "DC","American Samoa": "AS","Guam": "GU","Northern Mariana Islands": "MP","Puerto Rico": "PR","United States Minor Outlying Islands": "UM","U.S. 
Virgin Islands": "VI",}# Filter all the attacks in the US alonedf_attacks_us = df_attacks[df_attacks["Country"] =="United States"] df_attacks_us = pd.DataFrame(df_attacks_us.groupby("Province/State")["Event ID"].count())df_attacks_us = df_attacks_us.reset_index()df_attacks_us.rename(columns={"Province/State": "State", "Event ID": "Number of Terrorist Attacks"}, inplace=True)df_attacks_us = df_attacks_us[df_attacks_us["State"] !="Unknown"]df_attacks_us["State Code"] = df_attacks_us["State"].apply(lambda x: us_state_to_abbrev[x])# Standardize the terrorism score between 0 and 1def scale_column(df, column, minVal=float('-inf'), maxVal=float('inf')):if minVal ==float('-inf'): minVal =min(df[column])if maxVal ==float('inf'): maxVal =max(df[column]) res = []for val in df[column]: res.append((val - minVal) / (maxVal - minVal))return res# Count the number of terrorist attacks in each US state and standardize the number based on populationdf_attacks_us = df_attacks_us.merge(df_us_population[['State', 'Population']])df_attacks_us["Number of Terrorist Attacks (Standardised)"] = df_attacks_us["Number of Terrorist Attacks"] / df_attacks_us["Population"]tempVal = scale_column(df_attacks_us, "Number of Terrorist Attacks (Standardised)")df_attacks_us["Number of Terrorist Attacks (Standardised)"] = tempValdf_attacks_us = df_attacks_us.sort_values(by="Number of Terrorist Attacks (Standardised)", ascending=False)# Plot the choropleth for terrorist attacks in the US statesfig = px.choropleth(df_attacks_us, locations='State Code', color='Number of Terrorist Attacks (Standardised)', color_continuous_scale="Viridis", locationmode="USA-states", scope="usa", labels={'Number of Terrorist Attacks (Standardised)':'No. 
of Attacks (Standardised)'}, title="Terrorist Attacks in the US (1970-2017)")fig.update_layout(title_x=0.44)fig.update_layout( legend = {"xanchor": "right", "x": -0, "y":1.9})fig.update_layout(height=500, width=780)fig.show()```@fig-us shows different states in the US with a varying number of terrorist attacks, which has been calculated by dividing the total number of terrorist attacks in a given state by its population and standardizing in such a way that the state with the highest scoreis assigned a value of 1 and the least score is assigned a value of 0. We see that New York, Oregon, California, Washington,and Nebraska are the five most terrorist-prone states in the US and Kentucky, South Carolina, West Virginia, Alaska, and Arkansasare the safest states in terms of the frequency of terrorist attacks. So what exactly motivates these terrorist groups and has it changed over the last fifty years?```{python}#| label: fig-motives#| layout-ncol: 2#| fig-cap: "Attack Motives in the US"#| fig-subcap: #| - "1970-1999"#| - "2000-2017"# Create two word clouds showing the motives of the terrorist attacks.# First, download the stopwords and add common words from the motives columnstpwrd = nltk.corpus.stopwords.words('english')extended_list = ["specific", "motive", "unknown", "Unknown", "incident", "claimed", "responsibility", "however", "unaffiliated", "individual", "identified", "killed", "stated", "anti", "attacks", "protest", "carried", "attack", "trend", "larger", "may", "part", "following", "community", "sources", "violence", "targeting", "noted", "posited", "suspected", "targeting", "members", "noted", "targeted", "also", "assailant", "perpetrator", "meant", "bring attention", "practice", "perpetrator", "assailant", "meant", "bring", "attention"]stpwrd.extend(extended_list)# Select all the attacks in the USdf_attacks_us = df_attacks[df_attacks["Country"] =="United States"]df_attacks_us = df_attacks_us[["Year", "Motive"]]df_attacks_us = df_attacks_us.dropna()# Select 
the subset of the dataset above to only include the years between 1970 and 1999 inclusive.temp_df = df_attacks_us[(df_attacks_us["Year"] >=1970) & (df_attacks_us["Year"] < (2000))]motive =list(temp_df["Motive"].values)motive =" ".join(motive)# Plot the word cloudwordcloud = WordCloud(width=1000, height=800, background_color ='white', stopwords=stpwrd, color_func=lambda*args, **kwargs: "green", min_font_size =10).generate(motive)plt.figure(figsize = (12, 12), facecolor =None) plt.imshow(wordcloud) plt.axis("off")plt.tight_layout(pad =2)plt.title("Attack Motives ("+str(1970) +" - "+str(1999) +")", fontdict={'fontsize': 36})plt.show()# Select the subset of the dataset above to only include the years between 2000 and 2017 inclusive. temp_df = df_attacks_us[(df_attacks_us["Year"] >=2000) & (df_attacks_us["Year"] <= (2017))]motive =list(temp_df["Motive"].values)motive =" ".join(motive)# Plot the word cloudwordcloud = WordCloud(width=1000, height=800, background_color ='white', stopwords=stpwrd, color_func=lambda*args, **kwargs: "purple", min_font_size =10).generate(motive)plt.figure(figsize = (12, 12), facecolor =None) plt.imshow(wordcloud) plt.axis("off")plt.tight_layout(pad =2)plt.title("Attack Motives ("+str(2000) +" - "+str(2017) +")", fontdict={'fontsize': 36})plt.show()```Both the word clouds share a common theme of abortion, suggesting that this has been a prominent topic of discussion and conflict for several decades in the US. However, the wordclouds also differ in significant ways. The first wordcloud, which pertains to the pre-2000 period, reveals issues that were relevant to Puerto Rico, Vietnam, and African American groups. The second wordcloud, which represents the post-2000 period, shows themes that are related to Iraq, ISIL, and Islamic states. These topics tend to indicate that there has been an increase in attacks associated with religion over the past 20 years in the US. 
This shift in topics aptly reflects the changes in the political landscape both domestically and internationally - from fighting against the spread of communism and racism to battling religiously motivated terrorism.### Global TerrorismNow that we have analyzed the state of terrorism in the US, how about we move to get its bigger picture? Let's begin by analyzing how the frequency of terrorist attacks has changed over the last 50 years. ```{python}#| label: fig-frequency#| fig-cap: "Frequency of Terrorist Attacks"# Count the number of terrorist attacks each year and plot a bar chart.yearly_freq = pd.DataFrame(df_attacks.groupby("Year")["Event ID"].count()).reset_index()yearly_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)fig = px.bar(yearly_freq, x=yearly_freq["Year"], y=yearly_freq["Number of Terrorist Attacks"], title="Frequency of Terrorist Attacks (1970-2017)")fig.update_layout(title_x=0.5)fig.update_layout(height=400)fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)','paper_bgcolor': 'rgba(0,0,0,0)'})fig.show()```It is clear from @fig-frequency that the number of terrorist attacks was at its minimum around the years 1972 and 2003 (it is worth mentioning that the data for 1994 was missing and not 0) and has greatly increased over the last 10 yearsin the dataset (2007-2017). So, what parts of the world have experienced the highest number of terrorist attacks? 
```{python}#| label: fig-regions#| fig-cap: "Terrorist Attacks in different Regions"# Count the number of terrorist attacks in each geographical regions and group them based on target type.region_freq = pd.DataFrame(df_attacks.groupby(["Region", "Attack Type"])["Event ID"].count()).reset_index()region_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)region_freq = region_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)region_freq['Attack Type'] = region_freq['Attack Type'].replace(['Bombing/Explosion', 'Hostage Taking (Kidnapping)', 'Facility/Infrastructure Attack', 'Hostage Taking (Barricade Incident)'], ['Bombing', 'Hostage', 'Facility Attack', 'Hostage (Barr.)'])# Plot the bar chart.fig = px.bar(region_freq, x=region_freq["Region"], y=region_freq["Number of Terrorist Attacks"], color="Attack Type", height=400, title="Terrorist Attacks in Different Regions", barmode="relative")fig.update_layout(title_x=0.5)fig.update_layout(height=500)fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)','paper_bgcolor': 'rgba(0,0,0,0)'})fig.show()```@fig-regions shows that the Middle East & North Africa, South Asia, and South America were the three most terrorist-prone regions. On the other hand, Australasia & Oceania, Central Asia, and East Asia were the safest regions in terms of terrorism. 
It is also worth noting that in all the geographical regions, the terrorist groups used bombing and armed assault as the most common form of attacks.Let's delve deeper to see which countries from these terrorist-prone regions were contributing the highest number of terrorist incidents.```{python}#| label: fig-countries#| fig-cap: "Countries with the Highest Number of Attacks"# Count the number of casualties in each country.df_countries_casualties = pd.DataFrame(df_attacks.groupby(["Country"])["Casualties"].sum().reset_index())df_countries_terrorist_count = pd.DataFrame(df_attacks.groupby(["Country"])["Event ID"].count().reset_index())df_countries_terrorist_count.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)df_merged_casualties_count = df_countries_casualties.merge(df_countries_terrorist_count[["Country", "Number of Terrorist Attacks"]])# Map the country names to their ISO codesdf_iso_codes = px.data.gapminder()[["country", "iso_alpha"]]df_iso_codes.rename(columns={"country": "Country", "iso_alpha": "Country Code"}, inplace=True)df_iso_codes.drop_duplicates(inplace=True)df_iso_codes = df_iso_codes.reset_index()df_iso_codes.drop(["index"], axis=1, inplace=True)df_countries_terrorist_count = df_countries_terrorist_count.merge(df_iso_codes[['Country', 'Country Code']])df_countries_terrorist_count["No. of Attacks"] = df_countries_terrorist_count["Number of Terrorist Attacks"]# Plot a choropleth representing the frequency of attacks in different countries.fig = px.choropleth(df_countries_terrorist_count, locations="Country Code", color="No. of Attacks", hover_name="Country", color_continuous_scale=px.colors.sequential.Plasma, title="Terrorist Attacks (1970 - 2017)")fig.update_layout(title_x=0.5)fig.update_layout(height=500)fig.show()```@fig-countries shows that Iraq in the Middle East; Afghanistan, Pakistan, and India in South Asia, and Colombia in South America were the most terrorist-prone countries. 
The analysis of global terrorism is incomplete without information on terrorist groups. How about we visualize the top 15 most notorious terrorist groups based on the number of casualties from their attacks?```{python}#| label: fig-groups#| fig-cap: "Attacks by Different Terrorist Groups"# Count the number of casulaties for different terrorist groupsgroupwise_casualty_freq = pd.DataFrame(df_attacks.groupby("Terrorist Group")["Casualties"].sum()).reset_index()groupwise_casualty_freq = groupwise_casualty_freq.sort_values(by="Casualties", ascending=False)[:16]notorious_groups =list(groupwise_casualty_freq["Terrorist Group"])notorious_groups.remove("Unknown")df_notorious_groups = df_attacks[df_attacks["Terrorist Group"].isin(notorious_groups)]df_notorious_groups = pd.DataFrame(df_notorious_groups.groupby(["Terrorist Group", "Year"])["Casualties"].sum().reset_index())df_notorious_groups["Terrorist Group"] = df_notorious_groups["Terrorist Group"].replace(["Farabundo Marti National Liberation Front (FMLN)", "Islamic State of Iraq and the Levant (ISIL)", "Kurdistan Workers' Party (PKK)", "Liberation Tigers of Tamil Eelam (LTTE)", "New People's Army (NPA)", "Nicaraguan Democratic Force (FDN)", "Revolutionary Armed Forces of Colombia (FARC)", "Shining Path (SL)", "Tehrik-i-Taliban Pakistan (TTP)"], ["Farbundo Liberation", "ISIL", "Kurdistan W.", "Tamil Tigers", "New People's Army", "Nicaraguan Force", "Colombian Force", "Shining Path", "Taliban Pakistan"])# Plot a line chart for the 15 terrorist groups with the highest number of casualtiesfig = px.line(df_notorious_groups, x="Year", y="Casualties", color="Terrorist Group", title='Attacks by different Terrorist Groups')fig.update_layout(title_x=0.5)fig.update_layout(height=500)fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)','paper_bgcolor': 'rgba(0,0,0,0)'})fig.show()```One cannot fail to notice the peak in 2001 for Al Qaida's suicide terrorist attack against the United States (widely known as the 9/11 attack), which is 
widely taken as the beginning of the rise of other Islamic religious extremist terrorist groups. Another interesting finding: Taliban, Boko Haram, and ISIL, as we can see from the steep lines after 2010 in @fig-groups, appear to have killed more people than all the other 12 terrorist groups combined in the last 50 years.So what exactly do these terrorist groups target? Let's find out.```{python}#| label: fig-targets#| fig-cap: "Common Targets of Terrorist attacks"# Select the most common targets of the terrorist groups.TOP_N =11target_freq = pd.DataFrame(df_attacks.groupby("Target Type")["Event ID"].count()).reset_index()target_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)rem_freq = target_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[TOP_N:]target_freq = target_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[:TOP_N]target_freq = target_freq[target_freq['Target Type'] !="Unknown"]# Plot a bar chart.fig = px.bar(target_freq, x='Target Type', y='Number of Terrorist Attacks', title="Common Targets of Terrorist Attacks")fig.update_layout(title_x=0.5)fig.update_layout(height=500)fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)','paper_bgcolor': 'rgba(0,0,0,0)'})fig.show()```Most of the attacks have been targeted toward private citizens & property, the military, and the police. Private citizens & properties are generally the easiest groups to be attacked and also the group with highest population. This might be one possible explanation for such a high number of attacks on them. ### Socioeconomic AspectsNow, let's change our direction a little bit. 
We will analyze how terrorism is related to different socioeconomic factors like GDP and fertility rate.```{python}#| label: fig-socioeconomic#| layout-nrow: 2#| fig-cap: "Socio-economic Aspects of Terrorist-prone Countries"#| fig-subcap: #| - "GDP"#| - "Fertility Rate"# Maps a country to its geographical regiondef map_region(country): region =list(df_attacks[df_attacks["Country"] == country]["Region"])[0]return region# Find the top 5 countries with the highest number of terrorist attackscountry_freq = pd.DataFrame(df_attacks.groupby("Country")["Event ID"].count()).reset_index()country_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)country_freq = country_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[:10]country_freq["Region"] = country_freq["Country"].apply(map_region)top_five_countries =list(country_freq["Country"].values)[:5]# Count the yearly frequency of terrorist attacks of the top five countries.country_freq_year = pd.DataFrame(df_attacks.groupby(["Year", "Country"])["Event ID"].count().reset_index())country_freq_year = country_freq_year[country_freq_year["Country"].isin(top_five_countries)]country_freq_year.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)# Select only the attacks from the top five countries.df_terrorist_gdp = df_gdp[(df_gdp["Country"].isin(top_five_countries)) & ((df_gdp["Year"] >=1970) & (df_gdp["Year"] <=2017))]df_all_gdp = df_gdp[((df_gdp["Year"] >=1970) & (df_gdp["Year"] <=2017))]df_all_gdp = df_all_gdp.dropna()df_all_gdp = pd.DataFrame(df_all_gdp.groupby("Year").mean().reset_index())df_all_gdp.rename(columns={"GDP (in USD)": "World"}, inplace=True)# Assign a specific color to each of the countries. 
World will take the black color.colorList =list(px.colors.qualitative.T10)if colorList[0] !="black": colorList.insert(0, "black")for country in top_five_countries: temp_gdp = df_terrorist_gdp[df_terrorist_gdp["Country"] == country] df_all_gdp[country] =list(temp_gdp["GDP (in USD)"])# Plot a line chart showing the GDP of the top five countries and the world.fig = px.line(df_all_gdp, x='Year', y=df_all_gdp.columns[1:], title="GDP of Terrorist-prone Countries", color_discrete_sequence=colorList, labels={"value": "GDP (in USD)","variable": "" })fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)','paper_bgcolor': 'rgba(0,0,0,0)'})fig.update_layout(title_x=0.5)fig.update_layout(height=400, width=800)fig.show()# Select the fertility rate of the top five countries. df_all_fertility = df_population[(df_population["Year"] >=1970) & (df_population["Year"] <=2017)]df_terrorist_fertility = df_population[(df_population["Country"].isin(top_five_countries)) & ((df_population["Year"] >=1970) & (df_population["Year"] <=2017))]df_all_fertility = df_all_fertility.dropna()df_all_fertility = df_all_fertility.drop(['Migrants (net)'], axis=1)df_all_fertility = pd.DataFrame(df_all_fertility.groupby("Year").mean().reset_index())df_all_fertility.rename(columns={"Fertility Rate": "World"}, inplace=True)for country in top_five_countries: temp_fertility = df_terrorist_fertility[df_terrorist_fertility["Country"] == country] df_all_fertility[country] =list(temp_fertility["Fertility Rate"])# Plot a line chart showing the fertility rate of the top five countries along with the same for the world.fig = px.line(df_all_fertility, x='Year', y=df_all_fertility.columns[1:], title="Fertility Rate of Terrorist-prone Countries", color_discrete_sequence=colorList, labels={"value": "Fertility Rate","variable": "" })fig.update_layout({'plot_bgcolor': 'rgba(0,0,0,0)','paper_bgcolor': 'rgba(0,0,0,0)'})fig.update_layout(title_x=0.5)fig.update_layout(height=400, width=800)fig.show()```@fig-socioeconomic shows the 
GDP and fertility rate of the aforementioned five-most terrorist-prone countries. We can clearly see from the graphs above that all these countries generally have a lower GDP and higher fertility rate compared to the global average in the given period. India is an exception, having its GDP increase at a faster rate than the global average. Similarly, Colombia is an exception, having its fertility rate below the global average right from the mid-1980s.## Machine and Deep Learning{#fig-nnetwork}```{python}#| eval: false# Remove the columns we will not be using for the modeling.try:del df_attacks["Event ID"]del df_attacks["Motive"]del df_attacks["Latitude"]del df_attacks["Longitude"]except:print("Some of the columns are not present")df_attacks = df_attacks.dropna()df_attacks[['Country', 'Region', 'Province/State', 'City', 'Attack Type', 'Target Type', 'Terrorist Group', 'Weapon Type']] = df_attacks[['Country', 'Region', 'Province/State', 'City', 'Attack Type', 'Target Type', 'Terrorist Group', 'Weapon Type']].apply(LabelEncoder().fit_transform)# Split into predictor and response variables.y = df_attacks["Casualties"]X = df_attacks.drop(['Casualties'], axis=1)# Split the data into train (70%), validation (15%), and test (15%) setsX_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.15, random_state=42)X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.20, random_state=42)# Scale the dataset.scaler = RobustScaler()X_train = scaler.fit_transform(X_train)X_test = scaler.fit_transform(X_test)X_val = scaler.fit_transform(X_val)# Neural Networksdef create_bilstm(): model = Sequential() model.add(Bidirectional(LSTM(128, activation='relu', input_shape=(12,1), return_sequences=True))) model.add(Dropout(0.2)) model.add(Bidirectional(LSTM(64, activation='relu'))) model.add(Dropout(0.2)) model.add(Dense(32, activation='relu')) model.add(Dense(1))return modeldef create_ffnn(): model = Sequential() model.add(Dense(128, 
activation='relu', input_shape=(12,))) model.add(Dropout(0.3)) model.add(Dense(64, activation='relu')) model.add(Dropout(0.2)) model.add(Dense(32, activation='sigmoid')) model.add(Dense(16, activation='tanh')) model.add(Dense(1))return modeldef create_cnn(): model = Sequential() model.add(Conv1D(32, 3, activation='relu', input_shape=(12,1))) model.add(MaxPooling1D(2)) model.add(Conv1D(64, 3, activation='relu')) model.add(MaxPooling1D(2)) model.add(Flatten()) model.add(Dense(64, activation='relu')) model.add(Dense(1))return modeldef create_gru(): model = Sequential() model.add(GRU(64, activation='tanh', input_shape=(12,1))) model.add(Dropout(0.2)) model.add(Dense(32, activation='tanh')) model.add(Dropout(0.2)) model.add(Dense(1, activation='linear'))return model# Result containerresult = []dlModels = {"Feed Forward NN": create_ffnn(), "CNN": create_cnn(), "GRU": create_gru(), "Bi-LSTM": create_bilstm()}# Reshape the train, test and validation sets.X_train_new = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)X_val_new = X_val.reshape(X_val.shape[0], X_val.shape[1], 1)X_test_new = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)# Train the neural networks one at a time.for name, model in dlModels.items(): start_time = time.time() model.compile(optimizer='adam', loss='mse')if name =="Bi-LSTM": model.fit(X_train_new, y_train, epochs=20, batch_size=128, validation_data=(X_val_new, y_val)) y_pred = model.predict(X_test_new)else: model.fit(X_train, y_train, epochs=20, batch_size=128, validation_data=(X_val, y_val)) y_pred = model.predict(X_test) result.append([name, round(np.sqrt(mean_squared_error(y_test, y_pred)), 2), round(time.time() - start_time, 2)])# Train the machine learning models one at a time.mlModels = {"Random Forest": RandomForestRegressor(), "K Neighbors": neighbors.KNeighborsRegressor(), "Decision Trees": DecisionTreeRegressor()}for name, model in mlModels.items(): start_time = time.time() model.fit(X_train, y_train) pred = model.predict(X_test) 
result.append([name, round(np.sqrt(mean_squared_error(y_test, pred)), 2), round(time.time() - start_time, 2)])# Save the results in a csv file.pd.options.display.float_format ='{:.2f}'.formatresult_df = pd.DataFrame(result, columns=["Model", "Root Mean Squared Error", "Time (in seconds)"])result_df.to_csv("./results.csv") ```Let's take the machine and deep learning algorithms out of our arsenals and tackle the problem of predicting the number of casualties for any given attack based on the date, country, region, state, city, suicidal intent, type, target type, terrorist group, and weapon used in the attack. Such a model can prove invaluable for intelligence groups to assess the severity of potential attacks and prepare for them in the future.The dataset is split into the train, validation, and test sets in the ratio 70:15:15. The train and validation sets are used during the training phase, and the test set is used for assessing the efficiency of the models based on the time they take and their root-mean-squared (RMS) error. We use four different deep learning models (Feed Forward Neural Network, Bi-directional Long Short-Term Memory, Convolutional Neural Network, and Gated Recurrent Unit) and three other machine learning models (Random Forest, K-Nearest Neighbors, and Decision Trees) for this problem. 
Each of the neural networks has between five to seven layers, which are chosen based on their efficiency on the validation set.The results are shown in @fig-results.```{python}#| label: fig-results#| fig-cap: "Efficiency of Models"# Read the results from the csv.result_df = pd.read_csv("../results/results.csv")result_df = result_df.sort_values(by=['Root Mean Squared Error'])# Plot the data.matplotlib.rc_file_defaults()ax1 = sns.set_style(style=None, rc=None)fig, ax1 = plt.subplots(figsize=(12,6))colors = ["#5D3FD3", "#5D3FD3", "#5D3FD3","#5D3FD3", "#0096FF", "#0096FF", "#0096FF"]# Plot the bar chart and set figure options.sns.barplot(data = result_df, x='Model', y='Root Mean Squared Error', alpha=0.5, ax=ax1, palette=colors)ax1.set_xticklabels(ax1.get_xticklabels(), fontsize=12)ax1.set_xlabel("Models", fontsize=14)ax1.set_ylabel("Root Mean Squared Error", fontsize=14)ax1.set_title("Efficiency of Models", fontsize=16)# Plot the lineplot on the same chart and change the alpha level of the charts.ax2 = ax1.twinx()ax2.set_ylabel("Time (in seconds)", fontsize=14)dl = mpatches.Patch(color="#5D3FD3")ml = mpatches.Patch(color="#0096FF")custom_line = [Line2D([0], [0], color='#0096FF', lw=2), dl, ml]leg = plt.legend(custom_line, ["Time", "DL Models", "ML Models"], loc="upper left")for index, lh inenumerate(leg.legendHandles): if index >0: lh.set_alpha(0.5)sns.lineplot(data =list(result_df["Time (in seconds)"]), marker='o', ax=ax2, color='#0096FF')plt.show()```Feed Forward Neural Network turns out to be the most effective model, achieving an RMS error of 8.68, and Decision Trees is the fastest model, completing prediction in 0.99 seconds. In general, neural networks have a lower RMS error than other machine learning models but they are also slower to train and test than their machine learning counterparts. The lowest RMS error we got was 8.68, which is way higher than the average number of casualties at 2.40. 
That's a pretty big difference, so it's safe to say we still have some work to do before these models are truly usable. Our analysis ends here, but in the future we will explore more variables in the terrorism database, along with other socioeconomic factors and their relationship with terrorist attacks. We will also perform extensive hyperparameter tuning and train more sophisticated models, such as variants of FractalNets, ResNets, and XceptionNets, on a larger dataset, combining and feature-engineering different socioeconomic factors to achieve the lowest possible RMSE.

## More Animations

And before you go, here's a little treat for your eyes.

```{python}
#| eval: false

# Bar chart race for countries with the highest number of terrorist attacks.
# Yearly counts are computed for each country.
df_countries_pivot = pd.DataFrame(df_attacks.groupby(["Country", "Year"]).count()).reset_index()
df_countries_pivot.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_countries_pivot = df_countries_pivot.pivot_table(values='Number of Terrorist Attacks', index=['Year'], columns='Country')
df_countries_pivot.fillna(0, inplace=True)
df_countries_pivot.sort_values(list(df_countries_pivot.columns), inplace=True)
df_countries_pivot = df_countries_pivot.sort_index()
df_countries_pivot.iloc[:, 0:-1] = df_countries_pivot.iloc[:, 0:-1].cumsum()
bcr.bar_chart_race(df=df_countries_pivot, n_bars=10, period_length=1000, sort='desc',
                   title="Countries with the Highest Number of Terrorist Attacks",
                   filter_column_colors=True, filename=None)

# Bar chart race for terrorist attacks based on geographical regions.
df_region_pivot = pd.DataFrame(df_attacks.groupby(["Region", "Year"]).count()).reset_index()
df_region_pivot.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_region_pivot = df_region_pivot.pivot_table(values='Number of Terrorist Attacks', index=['Year'], columns='Region')
df_region_pivot.fillna(0, inplace=True)
df_region_pivot.sort_values(list(df_region_pivot.columns), inplace=True)
df_region_pivot = df_region_pivot.sort_index()
df_region_pivot.iloc[:, 0:-1] = df_region_pivot.iloc[:, 0:-1].cumsum()
bcr.bar_chart_race(df=df_region_pivot, n_bars=12, period_length=1000, sort='desc',
                   title="Terrorist Attacks Based on Geographical Regions",
                   filter_column_colors=True, filename=None)

# Bar chart race for terrorist groups with the highest number of attacks.
df_animation_pivot = df_notorious_groups.pivot_table(values='Casualties', index=['Year'], columns='Terrorist Group')
df_animation_pivot.fillna(0, inplace=True)
df_animation_pivot.sort_values(list(df_animation_pivot.columns), inplace=True)
df_animation_pivot = df_animation_pivot.sort_index()
df_animation_pivot = df_animation_pivot.drop(columns=["Unknown"])
df_animation_pivot.iloc[:, 0:-1] = df_animation_pivot.iloc[:, 0:-1].cumsum()
bcr.bar_chart_race(df=df_animation_pivot, n_bars=10, period_length=1000, sort='desc',
                   title="Terrorist Groups with the Highest Number of Attacks",
                   filename=None)
```

<iframe data-external="1" src="https://www.youtube.com/embed/qmtzgzcPTbk"></iframe>

<iframe data-external="1" src="https://www.youtube.com/embed/Fs7OdDxY3sg"></iframe>

<iframe data-external="1" src="https://www.youtube.com/embed/jvOC-eA_Hzo"></iframe>

## References

| Countries in the world by population (2023). Worldometer. Retrieved February 5,
| 2023, from https://www.worldometers.info/world-population/population-by-country/
| Information on more than 200,000 terrorist attacks. Global Terrorism Database.
| Retrieved February 5, 2023, from https://www.start.umd.edu/gtd/
| Lai, N. T. C. (2023, February 3). World population (1955-2020). Kaggle. Retrieved
| February 5, 2023, from https://www.kaggle.com/datasets/nguyenthicamlai/population-2022
| Mishinev, T. (2022, September 9). World, region, country GDP/GDP per capita. Kaggle.
| Retrieved February 5, 2023, from
| https://www.kaggle.com/datasets/tmishinev/world-country-gdp-19602021
| National Consortium for the Study of Terrorism and Responses to Terrorism. Global
| terrorism database. Kaggle. Retrieved February 5, 2023, from
| https://www.kaggle.com/datasets/START-UMD/gtd
| World Bank. GDP (current US$). World Bank national accounts data. Retrieved
| February 5, 2023, from https://data.worldbank.org/indicator/NY.GDP.MKTP.CD